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Abstract —In this paper, we propose a missing spectrum data 
recovery technique for cognitive radio (CR) networks using 
Nonnegative Matrix Factorization (NMF). It is shown that the 
spectrum measurements collected from secondary users (SUs) 
can be factorized as product of a channel gain matrix times 
an activation matrix. Then, an NMF method with piecewise 
constant activation coefficients is introduced to analyze the 
measurements and estimate the missing spectrum data. The 
proposed optimization problem is solved by a Majorization- 
Minimization technique. The numerical simulation verifies that 
the proposed technique is able to accurately estimate the missing 
spectrum data in the presence of noise and fading. 

Index Terms —Nonnegative Matrix Factorization, Cognitive 
Radio Network, Spectrum Sensing, Missing Data Estimation 

I. Introduction 

Recent advances in wireless communications and micro¬ 
electronic devices are leading the trend of research toward 
cognitive radios (CRs) |Q] . The main feature of CRs is the op¬ 
portunistic usage of spectrum. CR systems try to improve the 
spectrum efficiency by using the spectrum holes in frequency, 
time, and space domains 111 |2D- This means that secondary 
users (SUs) are allowed to utilize the spectrum, provided that 
their transmissions do not interfere with the communication 
of primary users (PUs) JH]. The fundamental components of 
CR systems that allow them to avoid interference are spectrum 
sensing and resource allocation. 

However, in a practical CR network, spectrum occupancy 
measurements for all the frequency channels at all times are 
not available. This is partially because of energy limitations 
and network failures. Another highly important and very 
common reason for occurrence of missing entries in the data 
set is the hardware limitation. Each SU may want to use 
different frequency channels, but it may not be capable of 
sensing all the channels simultaneously [4, 5fa. On the other 
hand, a complete and reliable spectrum sensing data set is 
needed for a reliable resource allocation. Therefore, we need 
to develop a method to estimate the missing spectrum sensing 
measurements. This task is especially more challenging in 
dynamic environments. 

There are different approaches toward the problem of data 
analysis in the CR networks. In fl], a learning approach 
is introduced based on support vector machine (SYM) for 


spectrum sensing in multi-antenna cognitive radios. SVM 
classification techniques are applied to detect the presence 
of PUs. Several algorithms have been been proposed using 
dictionary learning framework [7, [§]. These approaches try to 
find the principal components of data using dictionary learning 
and exploit the components to extract information. 

The goal of this paper is to estimate the missing spectrum 
sensing data as accurate as possible in the time varying en¬ 
vironments. An approach is introduced based on Nonnegative 
Matrix Factorization (NMF) fid to represent the spectrum 
measurements as additive, not subtractive, combination of 
several components. Each component reflects signature of one 
PU, therefore the data can be factorized as the product of 
signatures matrix times an activation matrix. 

Dimension reduction is an inevitable pre-processing step 
for high dimensional data analysis 0 . NMF is a dimension 
reduction technique that has been employed in diverse fields 
Hi. The most important feature of NMF, which makes 
it distinguished from other component analysis methods, is 
the non-negativity constraint. Thus the original data can be 
represented as additive combination of its parts. 

In our proposed method, a new framework is introduced to 
decompose the spectrum measurements in CR networks using 
a piecewise constant NMF algorithm in presence of missing 
data. Piecewise constant NMF and its application in video 
structuring is introduced in & In the proposed method, we 
try to handle the missing entries in the data and also take 
a different approach to solve the corresponding optimization 
problem using an iterative reweighed technique. 

In the context of CR networks, NMF is utilized in [l3] to 
estimate the power spectra of the sources in a CR network 
by factorizing the Fourier Transform of the correlation matrix 
of the received signals. Our proposed method estimates the 
missing entries in power spectral density measurements by 
enforcing a temporal structure on the activity of the PUs and 
can be used in scenarios when the number of the PUs is not 
known. 

The introduced method takes advantage of a prior infor¬ 
mation about the activity of the PUs and exploits piecewise 
constant constraint to improve the performance of the factor¬ 
ization. Moreover, a solution for the introduced minimization 


problem is suggested using the Majorization-Minimization 
(MM) framework. 

The rest of the paper is organized in the following order. 
In Section [Til the system model and the problem statement 
are introduced. Section [Till describes the proposed new NMF 
problem. In Section [iVl a method is presented to solve the 
piecewise constant NMF problem in MM framework with 
missing data. Section El presents the simulation results and 
finally Section PvTl draws conclusions. 

II. System Model 

Due to the nature of wireless environments, trustworthy 
information cannot be extracted from measurements of a single 
SU. To find the spectrum holes in frequency, time, and space, 
there exists a fusion center that collects and combines the 
measurements from all the SUs 0 ]. Cooperative spectrum 
sensing makes the missing data estimation algorithm more 
robust. Fusion center predicts the missing entries by using 
the collected measurements.However, since each SU is not 
able to sense the whole spectrum all the time, the data set 
collected from the SUs contains missing entries. Network 
failures, energy limitations, and shadowing can also cause loss 
of data. 

Without loss of generality, we want to reconstruct the power 
map in a single frequency band. The network consists of 
Np primary transmitters and Nr spectrum sensors that are 
randomly spread over the entire area of interest. Figure Q] 
illustrates an example of a network with Np = 2 PUs and 
Nr = 10 SUs in a 100 x 100 area. 
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Fig. 1: The power distribution of 2 PUs (squares) and deployment 
of 10 SUs (triangles) without considering shadowing effect. 


The received power of the r th sensor at time t can be written 


as 


Np 


Sr(t ) = + z r(t), 

3 = 1 


( 1 ) 


where pj ( t ) is the transmit-power of the j th PU at time t, y r j 
is the channel gain from the j th PU to the r th SU, and z r (t) is 
the zero-mean Gaussian measurement noise at the r th sensor 


with variance of- Considering a Rayleigh fading model, the 
channel gain coefficient can be modeled as: 


_C\h rj \ 2 
lrj ~ d r R 


( 2 ) 


where the channel constant C = , / is the carrier 

frequency, c is the speed of light, and Gp and Gr are the 
transmitter and receiver antenna gains, a is the path loss 
exponent which determines the rate at which power decays 
with the separation distance d r j between the r th SU and the 
jth ppj an( j \h r j \ 2 models the fading effect. 

At time slot t, measurements from SUs can be stacked in a 
vector s(t ), given as 


N P 

s (t) = + z(t), (3) 

j=i 

where s(t) = [ si(t) s 2 (t) ... SN R (t) ] T , 

7j(t)=[ 7ij(*) 72 jit) ... lN Rj (t)] T , and 

z(t) = [ zi(t) z 2 (t) ... ZN R (t) ] T . At each time slot, 
only a few SUs observe the power levels and report them to 
the fusion center. Therefore the vector s(t) contains some 
missing entries. Furthermore, each PU can be active or 
inactive in each time slot. 

Some of the characteristics of the environment can be 
exploited to simplify the problem. It is assumed that channel 
gains are slowly time varying such that they can be considered 
as constant in a time window. Therefore, matrix representation 
of ([3]) can be written as: 


S = VP + Z, (4) 

where S is an Nr x T matrix, which includes measurements 
from sensors in T time slots, T = [71,..., 7a/>] is a Nr x Np 
matrix, which consists of Np channel gain vectors in the Nr- 
dimensional space of data, and P is an Np x T matrix that 
indicates the power levels of PUs in each time slot ( pj t = 0 
if the j th PU is inactive at time t). 

Here, the goal is to estimate the missing data using the par¬ 
tial observations. To achieve this goal, the data is decomposed 
using piecewise constant NMF. Then the components of data 
and the activation matrix are used to estimate the missing data. 


III. PC-NMF: Piecewise Constant Nonnegative 
Matrix Factorization 

Promoted by 0, it is easy to see that the measurements of 
each time slot can be represented as an additive, not subtrac¬ 
tive, combination of few vectors. This algebraic representation 
has a geometric interpretation. Figure [2| helps us to visualize 
the structure of data in a 3-dimensional space of data. In 
this figure, 3 SUs are measuring power levels in an area 
with 3 PUs. It is easy to notice that measurement vectors lie 
within a pyramid in the positive orthant with Np = 3 edges 
proportional to 7 j. This is due to fact that all the points in 
the pyramid can be written as an additive combination of the 
edges. 
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Fig. 2: Structure of data generated by Np = 3 PUs in Nr = 3 
dimensional space. 


Although it is assumed that the channel gains are sta¬ 
tionary for a time window of length T, PUs can become 
activated/deactivated in this time window any number of times 
and can change their transmission power in each activation. 
Hence, the power levels of PUs tend to be piecewise constant. 

NMF is a widely-used technique to decompose data to its 
nonnegative components. Here, the structure of the power level 
matrix P is exploited while handling the missing entries. As 
a result, the general objective function is presented as follows: 


minimize Dw(S\TP) + f3F(P ), 
subject to r > 0, P > 0, 


( 5 ) 


where Dw{S\TP) is a weighted measure of fit and F(P) 
is a penalty, which favors piecewise constant solutions. /3 is 
a nonnegative scalar weighting the penalty. The constraints 
denote that all the entries of T and P are nonnegative. W 
is an Nr x T weight matrix that is used to estimate the 
weighted distance between S and TP. The coefficients of the 
weight matrix denote the presence of data (w rt = 0 /w rt = 1 
if the measurement of the r th SU at time slot t is unavail¬ 
able/available). 

NMF algorithms utilize different measures of fit such as 
Euclidean distance, generalized Kullback-Leibler (KL) diver¬ 
gence, and the Itakura-Saito divergence . In all the cases, the 
distance can be calculated as the sum of the distances between 
different coefficients EMI. 


T Np Np 

D W (S\TP) = Y y ,w rt d(s rt \ T, 7 rjPjt), (6) 

t= 1 r=1 j=1 


In our case, Euclidean Distance is used as the measure of 
fit. This objective function is commonly used for problems 
with Gaussian noise model, a common noise model in com¬ 
munication systems, hence: 

Np i N P 

d ( s rt\ Y 7rjPjt ) = ^(S rt - XAift'i) 2 - (7) 

3 =1 3 =1 

Since there exist sharp transitions in power level of PUs 
and power level of each PU is constant in each transmission 
period, rows of P tend to be piecewise constant. In order to 
favor the piecewise constant solutions, the penalty function is 
defined as: 

T N P 

F ( p ) = YY \ pjt ~ i”- ( § ) 

t =2 j =1 U 

When n tends to 0, this penalty function represents the sum 
of £q norm of the transition vectors, i.e. p t — P(t-i)> where p t 
is an Np x 1 vector containing power levels of PUs in time 
t. This penalty favors the solutions with a lower number of 
transitions. However, since it is not differentiable, it can be 
replaced with a differentiable approximation: 

T N P 

Fe(P) = Pe ( Pjt ~ Pj(t — l))i 

t=2 j=l (9) 

X 2 

with p e (x) = , 

x z + e z 

where e 2 is a small positive constant and is much less than 
all the non-zero elements of (pjt — Pj(t-i)) 2 Vj ,t to avoid 
division by zero. 

In Section [I^ an algorithm is derived to find the minimizer 
of the following problem: 

minimize D W (S\TP) + /3F e (P ), 

rp ( 10 ) 

subject to r > 0, P > 0. 

After estimating P and T, the missing entries of S can be 
approximated using the equation S ^ TP. 


IV. Majorization-Minimization for Piecewise 
Constant NMF 

In this section, an iterative algorithm is described to find 
the solution of the optimization problem proposed in (ITOl) . 
For that, Majorization-Minimization (MM) framework is em¬ 
ployed 116, . MM algorithm and its variants have been used 

in various applications such as parameter learning and image 
processing U3, 123] • The update rules are derived to calculate 
the entries of P given the entries of T and then the entries of T 
given the entries of P, using an iterative reweighed algorithm. 

First, the update rules for P given T are derived. Then, the 
update rules for T will be derived in a similar manner. 

As it is clear in ©, we can write the distance measure as 
a sum of different time slots: 


D w (S\TP) = YC(Pt), 


(ID 


t= 1 
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where C(p t ) is the weighted Euclidean distance between s t 
and r Pt , given s t and T. In MM framework, the update 
rules are derived by minimizing an auxiliary function 10 . 
By definition, G(p t ,p t ) is an auxiliary function of C(p t ) if 
and only if G(p t ,p t ) > C(p t ) and G(p t ,p t ) = C(p t ) for 
\/p t . If G(p t ,p t ) is chosen such that it is easier to minimize, 
the optimization of C(p t ) can be replaced with iterative 
minimization of G(p t ,p t ) over p t . Thus, in the literature, 
convex functions are frequently used as the auxiliary functions 
112(3- H is shown in [ 16] that C(p t ) is non-increasing under 
the update 

p) +l = argmin G{p t ,p\). (12 ) 

Pt 


This is due to the fact that in the i th iteration we have 

C(p\ +1 ) < G(p\+\p\) < G(pj,pjl = C(p\). 

Following a similar approach as 111611 . the auxiliary function 
for the weighted Euclidean distance can be formulated as: 

G{p t ,Pt) = C{pl)+(p t -p l t )VC(pl)+hp t -pl) T K(pl)(p t -p l t ), 

(13) 

where K(p\) is an Np x Np diagonal matrix with 


kjjipi) = 

p jt 

qi = T T (w t @Tpl), 


(14) 


and kjj is the j th diagonal entry of K(p z t ) and Q is element¬ 
wise multiplication. 

To solve the problem presented in (flOl) . the contribution of 
F e (P) should be considered in the auxiliary function. For that, 
a convex version of F e (P) is employed: 


T N P 

Vjtipjt - Pj(t- 1)) 2 , 

t=2 j =1 

with y jt 


(P 


jt 


_ p- L > \2 , , 


(15) 


Now the update rules can be obtained using the iterative 
version of (fl5l ) . This means that yjt is updated in each iteration 
using the values of P in the previous iteration. 

To form the penalized auxiliary function, Gp(p t ,p\), we 
add up G{p t ,p\) with the contribution of p t to F(P). Thus, 
GpiPtiPt) can written as: 


Gp(p t ,Pt ) = G(p t ,p\) 

N P 

+p(52vjt(pjt - pj(t~ i)) 2 

3 =1 

N P 

+ Vite+l) (p)(t+l) ~ Pjt) ]• (16) 

3 = 1 

It is worthwhile to mention that yj\ = yj(r+ 1 ) = 0 Vj. 
Since Gp(p t ,p l t ) is convex, it can be easily minimized over 
p t by setting the gradient to zero. Hence the update rule is 


attained as: 


= - V i C (p|) +P i jt kjj(p i t ) + 2+ 2/3y* (t+1) pj (t+1) 
Pjt ■' fe«(f»i) + 2^j t + 2^ (t+1) 

vc(pj) = -r T («;t 0 st — wt © (r P j)), 

(!7) 

where VjC(p^) is the element of the gradient VC (pi). 

Finding the update rule for T is simple. This is due to the 
fact that F(P) is not a function of T. Hence, the update rule 
for T is similar to the update rule for standard NMF, except the 
missing entries must be taken into account [21]. The update 
rules can be written in matrix form as: 


-\Z+1 


= r z © 


(W(DS)P t 

(TF©(r i p))p 7 


(18) 


where © is the element-wise multiplication and the division 
is also performed in an element-wise manner. 

The obtained update rules in ([171) and (fl 8 l ) are exploited 
alternatively to estimate T and P. Then the missing entries of 
S are predicted by S = TP. 

However, by using the objective function in (flOl) . the 
optimization problem results in solutions with entries of P 
tend toward 0 and ||r|| tends toward oo. We take advantage 
of the scale ambiguity between T and P to avoid this issue. 
Fet A be a diagonal Np x Np matrix with its j th diagonal 
entry equal to 11 -y ■ 112 - In each iteration, the rescaled matrix 
pair (TA _1 , AP) is used instead of the original matrix pair 

(i\n 

As a practical scenario, we should also consider the case 
when the secondary network has no information about the 
number of the PUs, i.e. Np. In this case, the common 
dimension of matrices T and P is not known. There have 
been some efforts in model order selection in NMF 0 . In 
the numerical experiments, K > Np is used as the common 
dimension to factorize the data in such conditions. This is only 
possible if the secondary network has some information about 
the upper bound of Np. 


V. Numerical Results 

For the numerical experiments, one frequency channel is 
considered with N p = 3 active PUs in the area. Figure [3] illus¬ 
trates the topology of the network. Incomplete measurements 
are collected from Np = 20 SUs. 

We use the same simulation environment and the same 
network topology as in [d]. The simulation parameters are 
set as follows, unless otherwise is stated. The path loss is 
computed as (j^) a , where d is the distance, do = 0 . 01 , and 
a = 2.5. 7 r j is computed by multiplying the pathloss by the 
fading coefficient \h r j \ 2 where 


h r j(t) = rjh r j(t - 1) + y/\ — f] 2 v rj (t ), (19) 

77 = 0.9995, and v r j (t) is circulary symmetric zero mean 
complex Gaussian noise with variance 1 [8]. 

PUs’ activity is modeled by a first order Markov model. 
All the PUs utilize the spectrum A = 0.3 of the time slots. 

Transition matrix of the j th PU is ( 1 h aj ] and 
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Fig. 3: Network topology consisting of 3 PUs and Nr = 20 SUs 
marked by triangles. 

A = 1, . dj is the probability that the j th PU stops 

transmitting from time t — 1 to t and bj is the probability that 
the PU activates transmitting. The parameter a 3 is uniformly 
distributed over [0.05,0.15]. 

Each time a PU becomes activated, it chooses the trans¬ 
mission power from a uniform distribution with support 
[100,200]. Each SU makes a measurement with 70% of 
chance. The measurements are contaminated by additive white 
Gaussian noise. The noise variance is 10 -5 for all the SUs. 

Partial measurements are generated for T = 600 time slots. 
To reduce the computational burden, the first 300 time slots 
are used to estimate T. Next, by using the obtained T and the 
update rule (fT71) . P is estimated for all 600 time slots. The 
regularization factor /3 is set to 5 x 10 -3 and K = 5 factors 
are used to factorize the data. 

Figure 0] shows the true power levels and the reconstructed 
one at a randomly selected SU versus time for the time window 
of T = 300 samples. It can be seen that the missing entries 
are accurately recovered through the proposed method, and it 
is evident that the proposed algorithm can easily track abrupt 
changes in power level. 



Figure \5\ compares the RMSE, averaged over SUs, of the 
proposed method with two similar methods. The method 
introduced in H] exploits the spatial correlation between 
adjacent SUs’ measurements and semi-supervised dictionary 


learning (SS-DL) to estimate the missing entries. For the 
numerical results, the batch version of SS-DL is employed and 
the parameters are set to their optimal values. Furthermore, 
to emphasize the effect of the piecewise constant penalty, 
the results are also compared with the weighted NMF, i.e. 
WNMF ll2ll l22ll. WNMF employs binary weight matrix to deal 
with the missing entries. This figure shows that the proposed 
method outperforms its competitors in different noise levels 
(Figure [5j (a)) and different probabilities of miss (Figure [3(b)). 
Pmiss denotes the ratio of the missing entries among the 
spectrum data. 




Fig. 5: Performance of the proposed method for different noise levels 
and probability of miss, averaged over 200 Monte Carlo trials. 


This figure shows that WNMF and SS-DL almost perform 
the same for low noise variance and low Pmiss • However, 
for harsh environments with high noise variance or high 
Pmiss , SS-DL produces more accurate results. The PC-NMF 
method outperforms both methods in different noise levels 
and different probabilities of miss. For instance, PC-NMF 
has 11.6% less RMSE compared to the SS-DL method for 
°noise = 10 5 an( ^ Pmiss =0.3. 

This improvement in the performance does not increase 
the computational burden of the algorithm. Table U shows the 
running time£] for different methods averaged over 100 Monte 


Carlo trials for cH 0 - se = 10 


-5 


= 0.3, and K — 5. 


'All simulations have been performed under MATLAB 2014a environment 
on a PC equipped with Intel Xeon E5-1650 processor (3.20 GHz) and 8 GB 
of RAM. 
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TABLE I: Average Running Time 


Method 

Average Running Time (s) 

SS-DL 

10.3039 

WNMF 

0.0944 

PC-NMF 

0.0952 


It is known that the NMF methods converge much faster 
than methods based on gradient descent 12 lh . However, Table 
U also illustrates that the proposed method does not require 
more computational resources compared with WNMF. 

To study the effect of the piecewise constant penalty on the 
output of the algorithm, Figure [6] depicts the power level of two 
PUs and the estimated activation levels using the introduced 
method and WNMF. Both methods can estimate the power 
levels up to a scale factor. The number of factors is set to 3, 
i.e. K = N P . 


Original Power Levels 



Time Time 


Estimated Activation Levels with PC-NMF /5 = 10 2 



Time Time 


Estimated Activation Levels with WNMF 



Time Time 


Fig. 6: Original power levels of different PUs and the estimated 
activation levels with PC-NMF and WNMF. 

This figure illustrates the fact that the proposed method 
produces a more accurate factorization by taking advantage of 
piecewise constant constraint as a prior information. As it was 
expected, power levels estimated by PC-NMF are piecewise 
constant, while the results generated by WNMF are noisy. 
In fact, the piecewise constant penalty decreases the effect of 
noise and fading. Moreover, the sharp transitions are preserved 
in the factors returned by PC-NMF. 

VI. Conclusions 

By exploiting inherent structural feature of cognitive ra¬ 
dio networks, we proposed a piecewise constant NMF ap¬ 
proach that can decompose the data set into its components. 
Majorization-Minimization framework is utilized to solve the 
optimization problem of the piecewise constant NMF. Numer¬ 
ical simulations suggest that this method is able to predict the 
missing entries in the spectrum sensing database accurately. 
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